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Abstract 


This  paper  presents  a  structure-from-motion  system  which  delivers  dense  structure 
information  from  a  sequence  of  dense  optical  flows.  Most  traditional  feature-based 
approaches  cannot  be  extended  to  compute  dense  structure  due  to  impractical  com¬ 
putational  complexity.  We  demonstrate  that  by  decomposing  uncertainty  information 
into  independent  and  correlated  parts  we  can  decrease  these  complexities  from  0(N2) 
to  O(N),  where  N  is  the  number  of  pixels  in  the  images.  We  also  show  that  this  dense 
structure-from-motion  system  requires  only  local  optical  flows,  i.e.  image  matchings 
between  two  adjacent  frames,  instead  of  the  tracking  of  features  over  a  long  sequence 
of  frames. 


Ill 


1  Introduction 


Structure  from  motion  has  been  one  of  the  most  active  areas  in  computer  vision 
during  the  past  decade.  The  idea  is  to  recover  structure  or  shape  information  from 
a  sequence  of  images  taken  under  unknown  relative  motions  between  the  camera  and 
the  scene.  Most  approaches  proposed  in  the  literature  can  be  classified  according  to 
whether  they  are  based  upon  features  or  optical  flows. 

Feature-based  methods  compute  the  relative  structure  information  among  features 
by  analyzing  their  2D  motion  in  images.  Examples  of  such  systems  are  reported  by 
Tsai  k  Huang  [19],  Tomasi  k  Kanade  [18],  Broida  et  al  [5]  and  Azarbayejani  k 
Pentland  [3]  .  Because  the  whole  analysis  is  limited  to  features  which  usually  number 
not  more  than  hundreds,  the  results  from  those  systems  yield  very  sparse  shape 
information.  While  stripping  a  full-resolution  image  to  a  handful  of  features  may 
greatly  simplify  the  algorithm  and  the  computation,  most  of  information  contained 
in  the  image  is  lost.  In  many  applications  such  as  model  acquisition,  inspection  and 
navigation,  dense  structural  information  is  more  desirable. 

Traditional  flow-based  methods,  such  as  reported  by  Bruss  k  Horn  [6],  Weng  et  al 
[20],  Heeger  k  Jepson  [11],  Adiv  [1],  have  concentrated  on  either  solving  the  problem 
of  recovering  motion  and  structure  from  a  single  optical  flow  field  or  using  very  low 
resolution  optical  flows.  As  far  as  we  know,  little  has  been  done  to  achieve  a  dense 
structure-from- motion  system  except  Heel’s  work  in  [13].  Unfortunately,  as  pointed 
out  in  [18]  that  whether  the  proposed  iterative  algorithm  in  [18]  converges  is  still  an 
open  question.  Overall,  the  difficulties  of  such  a  system  arise  from  two  main  factors: 

•  Computation.  While  a  feature-based  method  can  easily  afford  an  0(N 2)  or 
0(N3)  algorithm,  where  N  is  the  number  of  features,  a  flow-based  method  cannot 
even  afford  an  0(N2)  algorithm,  where  N  is  the  number  of  pixels. 

•  Accumulation.  While  a  feature-based  method  can  accumulate  structural  infor¬ 
mation  for  features  because  they  are  tracked  across  many  frames,  optical  flows 
usually  cannot  be  used  to  track  pixels  because  their  measurements  are  uncertain. 
In  other  words,  while  a  feature-based  approach  quantify  the  image  information 
as  either  totally  unreliable  or  very  reliable,  a  flow-based  approach  has  to  use 
a  spectrum  of  reliability.  Therefore,  it  is  impossible  to  accumulate  structural 
information  by  tracking  all  pixels  across  many  frames  in  a  flow-based  method. 

This  paper  shows  our  attempt  at  overcoming  these  difficulties.  We  demonstrate  a 
system  which  incrementally  accumulates  dense  structural  information  from  a  sequence 
of  optical  flows.  The  system  has  the  following  features: 

•  The  system  is  based  on  EKF  (extended  Kalman  filtering)  as  proposed  in  [5].  We 
will  show  in  our  experiment  that  the  nonlinearity  problem  is  actually  not  very 
serious  even  when  the  initial  data  are  very  crude. 

•  The  formulation  of  the  structure  from  motion  uses  separate  independent  and 
correlated  structure  uncertainty  estimations.  By  employing  the  separation  and 
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Figure  1:  The  Camera  Coordinate 


other  mathematical  techniques  such  as  Sherman-Morrison- Woodbury  inversion 
and  principal  component  analysis,  we  can  achieve  an  0(N)  numerical  algorithm 
to  compute  Kalman  filtering. 

•  The  underlying  motion  of  the  camera  can  be  discontinuous.  Unlike  many  EKF- 
based  approaches,  ours  computes  the  initial  motion  of  every  frame  independently. 

•  We  propose  the  concept  of  “Dynamic  Motion  Parameterization”,  which  means 
that  a  different  parameterization  of  the  six  motion  parameters  is  used  at  every 
frame.  Such  a  dynamic  parameterization  enables  that  the  optical  flow  is  equally 
sensitive  to  each  of  them,  and  therefore,  stabilizes  numerical  computations. 

2  System  Overview 

2.1  Coordinates  and  Motion 

The  system  is  based  on  the  camera  coordinate  system  OXYZ  shown  in  Figure  1,  in 
which  the  origin  O  is  the  center  of  projection,  the  Z  axis  coincides  with  the  optical 
axis,  and  the  image  plane  is  located  at  Z  =  1. 

If  the  relative  motion  of  the  camera  with  respect  to  the  scene  is  composed  of  a 
translation  velocity  ( U ,  V,  W)  and  a  rotation  velocity  (A,  B,  C ),  we  have  the  following 
relation  between  the  flow  velocity  (vx,vy)  and  the  depth  Z  of  pixel  location  (x,  y) 
from  [14]: 

%  =  C  + xW7  +  Axy  -  B( x2  +  1)  +  Cy,  (1) 

v,  =  =XtyK-Bxy  +  A(y2  +  l)-Cx.  (2) 

If  we  designate  the  camera  motion  parameterization  as  Mo  =  (U,V,W,A,B,  C)T  and 
the  flow  velocity  as  v  =  (vx,vy)T,  the  above  equation  can  be  expressed  as 

v  =  v(x,y,Z,M0),  (3) 
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or  its  inverse 


Z  =  Z(v,x,y,M0). 


(4) 


2.2  Block  Diagram  of  the  System 

Functionally,  the  system  is  decomposed  into  three  major  blocks  as  in  Figure  2.  For  the 
sake  of  simplicity,  we  will  refer  to  an  optical  flow  and  its  uncertainty  together  as  optical 
flow  information,  the  structure  and  its  uncertainty  together  as  structural  information, 
and  the  motion  and  its  uncertainty  together  as  motion  information.  We  also  designate 
the  camera  coordinate  system  before  current  motion  as  a  priori  coordinate  system  and 
the  camera  coordinate  system  after  current  motion  as  posteriori  coordinate  system. 
The  computations  within  each  block  are  as  follows. 

•  Initial  Motion  Estimate:  This  block  uses  the  current  optical  flow  information 
and  predicted  structure  information  to  compute  an  initial  estimate  of  motion 
information  for  the  current  frame.  Since  the  motion  can  be  discontinuous,  the 
current  motion  is  independent  of  the  previous  motions.  Once  the  motion  infor¬ 
mation  is  estimated,  we  can  re-parameterize  the  motion  parameters  such  that 
they  are  equally  sensitive  to  flow  variations. 

•  EKF-based  Update:  This  block  uses  the  current  flow  information,  predicted  struc¬ 
ture  information  and  initial  motion  information  to  compute  posteriori  structural 
and  motion  information.  The  structure  is  represented  with  respect  to  the  a  priori 
coordinate  system. 

•  Interpolation  and  Transformation:  This  block  converts  the  structural  informa¬ 
tion  from  the  a  priori  coordinate  system  into  the  posteriori  coordinate  system 
by  interpolation,  spatial  rotations  and  translations. 


3  EKF-based  Uncertainty  Update 

The  Extended  Kalman  Filtering  approach  has  been  applied  successfully  in  many  fields 
to  combine  uncertainty  information.  In  this  section,  we  will  briefly  go  through  the 
general  EKF  framework  as  in  [8]  (Chapter  6),  and  then  apply  this  framework  to  our 
nonlinear  problem. 

The  measurements  z  and  the  state  vector  x,  which  is  what  we  need  to  estimate, 
are  related  according  to: 

z  —  h(x )  +  n,  (5) 

where  n  is  a  zero-mean  measurement  error  whose  covariance  is  R.  If  the  a  priori 
estimates  of  the  state  vector  and  its  covariance  are  x_  and  P_  respectively,  the 


3 


Figure  2:  Block  Diagram  of  The  System 


posteriori  estimates  of  the  state  vector  and  its  covariance  after  combining  the  new 
measurements  are  x+  and  P+: 

x+  —  X-  +  K(z  —  h(x-)),  (6) 

P+  =  (I  —  KH)P_,  (7) 

where  K  is  the  Kalman  gain 

K  =  P_HT(HP_Hr  +  R)-\  (8) 


and  H  is  the  Jacobian  matrix  of  the  measurement  equations,  i.e. 


H  = 


In  the  problem  of  dense  structure  from  optical  flows,  the  state  vector  is  an  (N  + 
6)  vector  composed  of  six  motion  parameters  M  and  depth  at  every  pixel  Zi,i  = 
1,2, ...,  JV,  where  N  is  the  number  of  pixels.  We  express  the  state  vector  as 

x  =  (Mt,ZuZ2,...,Zn)t.  (10) 

Note  that  the  motion  parameters  M  have  a  linear  relation  with  the  original  motion 
parameters  M0  in  Eq.  3  as  we  will  show  later,  i.e. 

Mo  =  TM,  (11) 


where  T  is  a  6  x  6  non-singular  matrix. 

The  measurements  are  stored  in  a  2 N  vector  which  represents  flow  velocities  at 
every  pixel,  i.e. 

' . '  (12) 


>T  \T 


2=  (Z1,Z2,...,ZN) 
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where  Zi,  i  =  1,2, N  is  the  measured  flow  velocity  pari  at  each  pixel.  And  the  error 
covariance  of  the  measurement  vector  z  is  a  2 N  x  2 N  diagonal  block  matrix 


(  ri 


R  = 


r2 


rjv  / 


(13) 


in  which  r \,i  =  1, 2, N  is  the  2x2  error  covariance  matrix  of  the  flow  velocity  at 
each  pixel. 

Using  Eq.  3  and  Eq.  11,  the  measurement  equations  can  be  expressed  as 


in  which  x,,  yi,i  = 
ment  equations  is 


v{x1,yuZl,TM)  \ 

(  Vi  > 

z  = 

v{x2,y2,Z2,TM) 

— 

v2 

\  v(xw,yN,  Zn,TM)  ) 

VN  / 

1,2,  ...,1V  and  T  are  known.  The  Jacobian 


■  (14) 

matrix  of  the  measure- 


/ 


H  = 


9M 


dv<i 

dZ1 

dv2 

dz2 


\ 


\ 

=  (A  S). 

dviv 

dZN  / 


(15) 


in  which  A  is  a  2 TV  x  6  matrix  and  S  is  an  TV  x  TV  diagonal  block  matrix  with  each 
block  a  2  x  1  matrix. 

Now  that  we  have  formulated  the  problem  of  recovering  dense  structure  from  op¬ 
tical  flows  in  the  EKF  framework,  it  seems  like  all  we  need  to  do  is  to  plug  those 
formulas  into  Eq.  6  and  Eq.  7  so  that  the  dense  structure  information  can  be  recur¬ 
sively  estimated.  And  that  is  exactly  what  people  did  in  feature-based  methods  such 
as  [5]  and  [3].  Unfortunately,  if  we  apply  this  scheme  directly  to  the  dense  structure 
recovery  problem,  the  computation  and  memory  requirements  are  insurmountable. 
As  pointed  out  in  [17],  the  uncertainties  of  the  depth  values  Z,-,i  =  1, 2, ...,  TV  are  cor¬ 
related  due  to  uncertain  motion.  Thus  the  covariance  matrix  P  is  a  full  TV  x  TV  matrix. 
And  the  computation  of  the  Kalman  gain  in  Eq.  8  which  contains  an  inverse  of  a  full 
2 TV  x  2 N  matrix  requires  at  least  0(N 2)  computation  and  memory.  Considering  an 
ordinary  256  x  256  image,  even  a  symmetric  2 N  x  2 N  (here  N  —  256  x  256  =  65, 536) 
matrix  in  single  precision  will  require  more  than  30  Gigabytes  of  memory!  Even  if  we 
could  represent  such  a  matrix,  it  is  impractical  to  consider  inverting  it  on  ordinary 
workstations. 
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3.1  Decomposition  of  Independent  and  Correlated  Uncer¬ 
tainty 

Fortunately,  we  can  take  advantage  of  this  specific  problem  to  overcome  these  difficul¬ 
ties.  As  we  mentioned  before,  the  uncertainties  of  the  depth  values  are  correlated  due 
to  uncertain  motion.  Because  there  are  only  six  motion  parameters,  the  correlated 
uncertainty  of  the  depth  values  caused  by  a  single  uncertain  motion  is  an  N  x  N 
matrix  with  rank  of  only  six! 

Because  the  rank  of  the  correlated  uncertainty  is  much  smaller  than  N,  the  co- 
variance  matrix  P  can  be  decomposed  into  the  following  format: 


p  _  /  C„  C I  \ 

r  A  C„  (C,+UVT)  J' 


(16) 


where  Cm  is  a  6  x  6  matrix  representing  the  covariance  of  the  motion  parameters,  C; 
is  an  N  x  6  matrix  representing  the  correlation  between  the  motion  and  the  structure, 
Cs  is  an  NxN  diagonal  matrix  representing  the  independent  uncertainty  of  the  depth 
value  of  each  pixel,  and  U  and  V  are  both  N  xk  matrices  whose  outer  product  is  a 
rank  k  matrix  representing  the  correlated  uncertainty  of  the  depth  values.  Therefore, 
storing  the  matrix  P  sparsely  will  only  require  0 (N)  memory  if  k  is  a  constant. 

Now  that  we  can  represent  the  covariance  matrix  P,  we  will  show  that  in  the 
EKF  framework,  once  P_  can  be  represented  in  the  format  of  Eq.  16,  P+  can  also 
be  represented  in  the  same  format.  In  fact,  because  R  and  H  are  special  matrices, 
the  covariance  matrix  P  can  always  be  represented  sparsely  as  in  Eq.  16  throughout 
the  whole  optical  flow  sequence.  We  never  need  to  explicitly  represent  P  as  an 
(N  +  6)  x  (N  +  6)  matrix! 

In  our  system,  we  assume  that  the  motion  is  discontinuous,  i.e.  the  current  motion 
is  uncorrelated  to  previous  motions.  Under  this  assumption,  a  priori  correlation 
between  the  structure  and  the  current  motion  CP  in  Eq.  16  is  zero.  For  simplicity, 
we  will  assume  Cp  is  zero  in  the  following  sections,  though  in  situations  where  this 
assumption  is  not  true  we  also  have  similar  results.  If  P_  is  represented  as  in  Eq.  16, 
after  some  manipulation,  we  have 


HP_Ht  +  R  =  (SCsSt  +  R)  + 


ACm 


=  c1+uavf, 


at 

VTST 


(17) 


where  Ci  =  (SCsST  +  R)  is  an  NxN  diagonal  block  matrix  with  each  block  a  2  x  2 
matrix,  Ux  and  Vx  are  27V  x  (k  +  6)  matrices  as 


J 

(  \  1 

(  \ 

II 

p 

ACm  SU  ,Vi  = 

A  SV 

l  J  ' 

^  / 

(18) 
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By  applying  the  Sherman-Morrison- Woodbury  formula  as  in  [9]  and  Appendix  A, 
we  can  invert  the  above  matrix 


(HP_Ht  +  R)-1  =  (Ci  +  UiVf)"1  =  C2  +  U2V^,  (19) 


where  C2  is  also  an  N  x  N  diagonal  block  matrix  with  each  block  a  2  x  2  matrix,  U2 
and  Vi  are  2N  x  (k  +  6)  matrices. 

Substituting  Eq.  19  back  into  Eq.  8,  we  obtain  the  Kalman  gain 


K  = 


K 


m 


c3  +  u3v3t 


(20) 


where  Km  is  a  6  x  2N  matrix 


Km  =  CmATC2  +  CmATU2V^,  (21) 

C3  is  an  N  x  N  diagonal  block  matrix  with  each  block  a  1  x  2  matrix 

C3  =  CsStC2,  (22) 

U3  is  a  (N  +  6)  x  (3 k  -f  6)  matrix 


U, 


/  \ 

u  c,stu2  u 


V 


(23) 


/ 


and  V3  is  a  2 N  x  (3 k  +  6)  matrix 

/ 


v3  =  ^  C2SV  V2  V2(Ui’SV) 

Finally,  the  updated  covariance  P+  is 


(24) 


p  _  \  Cmp 


CT 

pp 


Cpp  (C4  +  U4vj)  )  ’ 

where  Cmp  is  a  6  x  6  posteriori  covariance  matrix  of  the  motion  parameters 

^-1 mp  —  Cm  KmACm, 


(25) 


(26) 


Cpp  is  an  N  x  6  posteriori  uncertainty  correlation  between  the  structure  and  current 
motion1 

Cpp  =  — (Cs  +  UVt)StK^,  (27) 


1Note  that  we  assumed  a  priori  correlation  between  structure  and  motion  Cp  is  zero.  But  the 
posteriori  correlation  Cpp  is  not  zero. 
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C4  is  an  N  x  N  diagonal  matrix  representing  the  independent  uncertainty  in  the 
structure  estimation 

C4  =  c,  -  C3SCs,  (28) 

and  U4  and  V4  are  N  x  (6k  +  6)  matrices,  whose  outer  product  represents  the  cor¬ 
related  uncertainty  in  the  structure  information 


U4 

V4 


(  U  -U3  -C.SIJ  -UjVJSU 
V  C,SrV3  V  V  j  . 


(29) 

(30) 


If  we  are  careful  about  the  ordering  of  matrix  multiplications  in  the  above  equa¬ 
tions,  we  then  have  an  algorithm  which  updates  the  state  vector  and  its  covariance 
using  O(kN)  computation  and  memory.  Unfortunately,  k  increases  linearly  after  each 
frame,  which  makes  the  above  algorithm  O(MN)  where  M  is  the  number  of  frames. 
Though  M  is  usually  much  smaller  than  the  number  of  pixels  N,  it  is  still  impractical 
for  long  image  sequences.  In  next  section,  we  introduce  weighted  principal  component 
analysis  to  keep  k  constant,  and  therefore  achieve  an  O(N)  algorithm. 


3.2  Weighted  Principal  Component  Analysis 


First  of  all,  let  us  consider  the  eigenvalues  and  eigenvectors  of  the  correlated  uncer¬ 
tainty  matrix  U4Vj.  In  general  case,  there  are  I  =  6k  +  6  non-zero  eigenvalues  and 
corresponding  eigenvectors,  which  can  be  computed  easily  as  in  Appendix  B.  Because 
the  outer  product  represents  covariance  which  must  be  symmetric,  it  can  be  expressed 


as 


/ 


u4vi  = 


\ 


ei  e2 


ej 


(  Ax 


A, 


\ 


a,; 


/ 


-T 

el 


-T 


(31) 


where  e,-, i  =  1,2,...,/  are  TV  x  1  eigenvectors,  and  A *  =  1,2,...,/  are  the  corre¬ 
sponding  eigenvalues  ordered  by  magnitude  such  that  \\  is  the  largest  eigenvalue. 

Every  eigenvector  is  an  N  x  1  vector,  which  represents  an  eigen-image.  This 
eigen-image  illustrates  the  pattern  of  the  depth  uncertainty,  and  the  corresponding 
eigenvalue  represents  the  magnitude  of  this  depth  uncertainty.  For  example,  if  the 
eigen-image  is  an  image  with  same  value  at  every  pixel,  the  depth  uncertainty  repre¬ 
sented  by  this  eigen-image  is  that  depth  values  of  all  pixels  can  change  but  only  by  the 
same  amount.  In  other  words,  changes  of  depth  values  allowed  by  this  eigen-image 
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have  to  be  in  the  pattern  specified  by  the  eigen-image: 


where  c  is  a  scalar  constant  and  e  is  the  eigen-image.  The  meaning  of  the  eigenvalue 
is  similar  to  that  of  a  in  a  Gaussian  distribution,  which  represents  the  magnitude  of 
the  uncertainty. 

Since  the  eigenvalues  in  Eq.  31  are  in  descending  order,  and  we  can  truncate  the 
eigenvalues  after  first  k  largest  ones,  i.e. 


Thus  U4  and  V4  can  both  be  reduced  to  N  x  k  matrices.  The  iterations  of  EKF 
updating  illustrated  in  the  previous  section  can  be  carried  out  in  O(iV)  for  every 

frame  no  matter  how  long  the  sequence  is. 

The  underlying  assumption  of  truncating  small  eigenvalues  in  Eq.  33  is  that  the 
uncertainty  implied  by  those  eigenvalues /eigen-images  is  negligible  compared  to  the 
independent  uncertainty  C4.  And  the  reason  for  keeping  large  eigenvalues  is  that  we 
assume  the  uncertainties  implied  by  these  large  eigenvalues  and  their  corresponding 
eigenvectors  are  at  least  comparable  to  the  independent  uncertainty  C4.  But  since 
the  independent  uncertainties  of  pixels  are  not  uniform,  truncating  by  the  magni¬ 
tudes  of  eigenvalues  may  not  make  much  sense  at  all  because  even  though  a  rela¬ 
tively  large  eigenvalue  may  imply  a  large  uncertainty  in  a  certain  area  in  the  eigen 
image,  if  the  independent  uncertainty  happens  to  be  even  larger  in  the  same  area, 
this  eigenvalue/eigen-image  becomes  less  significant. 

Based  on  the  above  speculation,  we  propose  a  weighted  principal  component  anal¬ 
ysis,  i.e.  the  correlated  uncertainty  is  weighted  by  independent  uncertainty  before 
decomposition  as  in  Eq.  31.  Since  the  independent  uncertainty  C4  is  a  diagonal 
matrix,  and  its  diagonal  elements  have  to  be  positive,  we  can  decompose  it  as 
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Gamma  =  4.0 


Original  Image 

Figure  3:  An  Image  Sequence  of  A  Toy  House 

Therefore,  the  overall  uncertainty  can  be  represented  as 

C4  +  U4Vf  =  Q  (l  +  Q-1U4(Q-1V4)t)  Qt.  (35) 

In  other  words,  we  have  weighted  the  correlated  uncertainty  U4  and  V4  by  the  inde¬ 
pendent  uncertainty  Q_1.  We  then  truncate  small  eigenvalues  of  Q_1U4(Q_1V4)T. 

Figure  3  (Original  Image)  shows  a  toy  house  in  front  of  the  camera.  The  uncer¬ 
tainties  of  flow  velocity  in  the  dark  background  are  very  large  comparing  to  those 
of  the  house  area.  There  are  small  intensity  variations  in  the  background,  which  are 
visible  after  Gamma  correction  as  in  Figure  3.  Figure  4  shows  the  six  most  significant 
eigenvalues/eigen-images  of  the  correlated  structure  uncertainty  using  the  direct  de¬ 
composition  method  from  Eq.  33.  The  eigen-images  are  shown  by  linearly  quantizing 
0  to  grey-level  125,  -0.025  or  smaller  values  to  grey-level  0  and  0.025  or  larger  val¬ 
ues  to  grey-level  255.  As  indicated  by  either  high  or  low  grey-levels,  the  uncertainty 
information  captured  in  the  2nd,  3rd,  4th,  5th  and  6th  eigen-images/eigenvalues  is 
mainly  in  the  background  area.  On  the  other  hand,  Figure  5  shows  the  first  six 
weighted  principal  components.  Obviously,  the  weighted  principal  components  carry 
much  more  useful  structural  uncertainty  information. 

Since  eigen-images  represent  orthogonal  patterns  of  possible  change  or  deformation 
of  the  depth  map  in  Eq.  32,  they  also  lend  themselves  for  intuitive  interpretations. 
For  example,  across  the  house  in  the  second  eigen-image  in  Figure  5  there  is  one 
skew  line  of  grey-level  125,  whose  left  side  is  bright  and  right  side  is  dark.  Referring 
to  Eq.  32,  we  can  see  that  this  pattern  represents  a  possible  rotation  around  the 
skew  line.  Other  eigen-images  can  be  similarly  interpreted  though  the  patterns  may 
be  more  complicated.  In  the  context  of  structure  from  motion,  we  believe  that  the 
intrinsic  ambiguity  [2]  of  translation  versus  rotation  of  camera  is  represented  and 
carried  through  recursive  estimations  by  uncertainty  patterns  like  these  as  we  will 
show  in  experiments. 


A4  =  107.1  As  =  48.5  A6  =  27.7 

Figure  4:  Six  Non- Weighted  Principal  Components 


A4  =  0.4026  A5  =  0.1428  A6  =  0.0527 

Figure  5:  Six  Weighted  Principal  Components 
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4  Initial  Motion  Estimation 

Unlike  many  other  approaches  such  as  [3],  we  do  not  assume  continuous  motion.  In 
other  words,  we  assume  that  the  motion  at  the  current  frame  is  totally  unrelated 
to  the  motion  in  the  previous  frame  because  we  believe  that  the  continuous  motion 
assumption  is  unrealistic  in  many  cases  such  as  navigating  on  real  roads,  hand-held 
video  recording,  and  so  on.  Thus,  in  order  to  apply  the  EKF  framework  to  our  prob¬ 
lem,  we  need  to  estimate  an  initial  motion  and  its  a  priori  covariance  for  each  frame. 
Theoretically  we  don’t  need  a  priori  covariance  because  it  should  be  infinitely  large. 
But  in  practice  we  need  a  finite  one  for  numerical  stability  and  re-parameterization. 

First  of  all,  given  a  priori  structure  Z  and  flow  velocity  u,  we  can  estimate  the 
initial  motion  M0  from  Eq.  3  by  a  linear  least  squares  fitting,  e.g.  minimizing 

-  v(xi ,  % n,  Zi,M0))Tr~l{vi  -  v(xi,  t/;,  Z{ ,  Mo)),  (36) 

i— 1 

where  rt  is  the  covariance  of  the  flow  velocity  V{. 

But  we  cannot  use  the  above  minimization  to  estimate  the  covariance  of  Mo  because 
we  didn’t  consider  the  uncertainties  of  structure  Z.  Designating  the  a  priori  depth 
map  as  Za  and  the  depth  computed  from  current  motion  Mo  and  v  by  Eq.  4  as  ZC1 
we  have  the  following  objective  function  to  be  minimized 

N  _  _ 

^2(vi  -  v(xi,yi,  Zi,M0))Tri  1(v,  -  v(x{,yi,  Zi}  M0))  + 

(Za  -  Zcf(C  +  Cd  +  u  VT)~1(Za  -  Zc),  (37) 

where  Za  and  Zc  are  N  x  1  vectors,  C  -fUV^  is  the  structural  uncertainty  of  Za ,  and 
Cd  is  the  depth  uncertainty  caused  by  current  flow  uncertainty  given  M0.  Because 
the  flow  uncertainties  of  different  pixels  are  independent,  is  an  N  x  N  diagonal 
matrix. 

If  we  ignore  the  dependence  of  Cd  on  M0,  the  minimization  of  Eq.  37  can  be 
achieved  using  Levenberg-Marquardt  method  [16].  In  fact,  this  simplification  is  jus¬ 
tifiable  because  Cd  is  usually  insensitive  to  M0.  Once  the  objective  function  is  mini¬ 
mized,  the  curvature  at  the  minimal  value  can  be  used  to  compute  the  covariance  of 

Mo- 


4.1  Dynamic  Motion  Parameterization 

Motion  is  traditionally  parameterized  using  three  translation  parameters  and  three 
rotation  parameters  as  in  Eq.  1  and  Eq.  2.  As  pointed  out  in  [3],  if  the  camera  has 
a  long  focal  length,  the  optical  flow  is  much  more  sensitive  to  translations  in  the  XY 
plane  to  translations  in  the  Z  direction.  Ideally  we  want  the  optical  flow  to  be  equally 
sensitive  to  all  six  motion  parameters  because  otherwise  the  the  covariance  of  motion 
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Cm  in  EQ.  16  could  be  numerically  singular  or  near  singular  and  therefore  ruin  the 
numerical  computation  of  EKF. 

We  introduce  the  concept  of  “dynamic  motion  parameterization”  to  equalize  sen¬ 
sitivities  of  motion  parameters.  There  are  two  sources  of  sensitivity  difference: 

1.  Static  Sensitivity  Difference  is  caused  by  the  camera  configuration.  For  example, 
if  the  camera  has  a  narrow  field  of  view,  the  optical  flow  is  usually  much  more 
sensitive  to  rotation  than  translation. 

2.  Dynamic  Sensitivity  Difference  is  caused  by  current  flow  or  depth  estimate  in¬ 
stead  of  the  camera.  For  example,  if  the  optical  flow  has  uncertainty  much  larger 
in  one  direction  than  others,  the  optical  flow  is  less  sensitive  to  the  motion  which 
caused  optical  flow  in  that  direction. 

If  we  designate  the  covariance  of  M0  computed  from  minimizing  the  objective 
function  of  Eq.  37  as  Cmt,  we  can  normalize  sensitivities  by  using  a  new  set  of  motion 
parameters 

M  =  T-1M0,  (38) 

where 

Cm<  =  TTr.  (39) 

— *  *, 

It  can  be  easily  verified  that  the  covariance  of  the  new  motion  vector  M  is  the  unit 
matrix  I. 

Note  that  we  cannot  use  the  unit  matrix  as  Cm  in  Eq.  16.  Theoretically  the  a 
priori  motion  covariance  Cm  should  be  infinitely  large  due  to  uncorrelated  motion.  In 
estimating  covariance  of  the  motion  in  this  section,  we  have  already  used  the  optical 
flow  information  of  the  current  frame.  Therefore,  it  is  actually  posteriori  motion 
covariance!  Ideally,  we  want  the  a  priori  covariances  to  be  small  enough  to  avoid 
numerical  problems,  and  yet  large  enough  to  not  contain  any  information  about  the 
current  frame.  In  practice,  we  use  10001  as  a  priori  motion  covariance  Cm  because  it 
avoids  the  numerical  problem  of  an  infinitely  large  covariance  and  is  also  large  enough 
(compared  to  posteriori  covariance  I)  to  be  uninformative. 


5  Interpolation  and  Forward  Transformation 

We  represent  the  3D  shape  by  a  depth  map  in  the  current  camera  coordinate  system 
as  in  Figure  1.  Therefore,  we  need  to  transform  the  previous  depth  map  into  the 
current  camera  coordinate  system  and  resample  the  depth  map  according  to  the 
current  sensor  grid.  There  are  two  new  problems  which  were  previously  unsolved: 

•  Though  the  depth  map  and  its  independent  uncertainty  can  be  easily  interpo¬ 
lated  as  in  [15,  12],  the  interpolation  of  correlated  uncertainty  is  a  new  problem. 

•  Most  existing  recursive  structure-from-motion  systems  ignore  the  fact  that  mo¬ 
tion  and  structure  are  actually  correlated  as  Cpp  in  Eq.  25  when  they  rotate  or 
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Figure  6:  Vector  Field  as  Correlated  Uncertainty 


translate  the  structure  according  to  the  motion.  Though  the  exact  effects  of  this 
simplification  are  still  unknown,  our  system  will  perform  a  correlated  translation 
and  rotation. 

As  explained  in  Appendix  B,  since  a  correlated  uncertainty  is  always  a  positive 
definite  symmetric  outer  product,  it  can  be  represented  as 

UVT  =  BBt,  (40) 


where  B  is  an  N  x  k  matrix  just  like  U.  In  other  words,  every  row  of  B  is  a  vector  of 
length  k  that  can  be  regarded  as  an  attribute  of  the  corresponding  pixel.  Therefore 
we  can  represent  the  correlated  uncertainty  as  a  vector  field  as  in  Figure  6.  Further 
more,  the  correlated  uncertainty  between  any  two  locations  is  the  dot  product  of 
the  vectors  at  the  two  locations.  Interpolating  the  correlated  uncertainty  is  done  by 
interpolating  this  vector  field. 

Since  the  optical  flow  establishes  the  correspondence  between  two  adjacent  frames, 
we  can  interpolate,  resample  and  transform  the  depth  map  represented  in  the  pre¬ 
vious  camera  coordinate  such  that  we  have  the  depth  and  uncertainty  information 
for  grid  positions  in  the  current  frame.  We  designate  this  process  as  the  “forward 
transform”.  For  every  pixel  position  in  the  current  frame,  suppose  its  correspondence 
in  the  previous  frame  is  at  location  (x{,  i /,)  in  the  image  plane,  and  has  depth  Zi  in 
the  previous  camera  coordinate,  we  can  compute  the  depth  Zf  in  the  current  camera 
coordinate  using  the  motion  parameters,  e.g. 

Zf  =  f(xi,yt,Zi:M0)  =  f(xi,yt,Zi:TM),  (41) 


where  f  represents  3D  rotation  and  translation  function.  The  3D  transformation 
matrix  can  be  found  in  [7]  (page  52). 

To  compute  the  structural  uncertainty  in  the  new  camera  coordinate,  we  take  the 
derivative  of  Eq.  41 


dZf 


dZi  +  ■  dM 

dZi  dM 

ciidZi  H-  bf  dM 5 


(42) 
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where  bt  is  a  vector  of  length  six.  Thus  we  have  the  covariance  between  two  arbitrary 
points  as 


E[dZfdZf)  =  aidjE[dZidZj]  +  E[dZidM]  + 
a,jb[  E[dZjdM]  +  b[E[dMdMT]br 


From  Eq.  25,  we  know  that 


Cmp  =  E[dMdMT ], 

(  E[dZidM]T  \ 
E[dZ2dM)T 


Cpp  — 


(43) 

(44) 

(45) 


\  E[dZNdM]T  J 

Therefore,  we  have  the  structural  uncertainty  after  the  forward  transform  as 


E 


(  dZt  \  (  dZt  \ 

dZ+ 
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\  dZ%  ) 


\  dZ%  ) 


C+  +  U+(V+)T, 


(46) 


where 
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(  aiuf  aip{  bj  CmpbJ  ^ 
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^  o-nuJj  awpjv  bjj  Crnpbjr  J 
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(51) 
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in  which  «,•’  s  and  Fi’s  are  vectors  of  length  k  and  p^s  are  of  length  six.  Since  U+  and 
V+  are  now  N  x  (k  +  18)  matrices,  they  may  also  be  reduced  to  N  x  k  by  weighted 
principal  component  analysis. 


6  Implementation  Issues  and  Experiments 

6.1  Implementation  Issues 

We  implemented  our  system  using  single  precision  matrices.  As  always  in  manipulat¬ 
ing  large  matrices,  the  numerical  stability  has  to  be  carefully  watched  while  carrying 
out  those  computations.  Potentially  there  are  following  sources  of  unstable  compu¬ 
tations: 

1.  Matrix  Multiplication:  When  computing  the  inner  product  of  two  large  matrix 
VTU,  where  both  U  and  V  are  N  x  k  (k  «  N),  we  have  to  carry  those 
additions  in  double  precision  due  to  large  N.  For  example,  if  the  image  is 
256  x  256,  we  can  easily  lose  four  significant  digits  during  multiplications,  which 
could  be  devastating  if  they  are  carried  out  in  only  single  precision. 

2.  Ill-conditioned  Matrix:  There  is  always  a  danger  when  one  of  the  matrices  in 
the  computation  is  singular  or  near  singular.  In  the  worst  case,  we  may  lose  all 
significant  digits.  Thus  it  is  extremely  helpful  to  avoid  any  ill-conditioned  matrix 
if  possible.  In  our  system,  we  pay  special  attention  to  the  following  matrices: 

•  Flow  Uncertainty:  The  estimated  optical  flow  uncertainty  r,-’s  in  Eq.  13. 

•  Motion  Uncertainty:  We  used  dynamic  motion  parameterization  to  prevent 
the  motion  covariance  matrix  from  being  ill-conditioned. 

•  Kalman  Gain:  In  computing  the  Kalman  gain  as  in  Eq.  8,  it  is  numeri¬ 
cally  devastating  if  HP_Hr  +  R  is  ill-conditioned.  HP_Hr  represents  the 
projection  of  the  uncertainty  of  motion  and  structure  to  the  uncertainty  of 
optical  flow.  In  order  for  HP_HT  +  R  to  be  well  conditioned,  we  need  to 
make  sure  that  the  projected  uncertainty  of  optical  flow  is  not  significantly 
larger  (>  105)  than  the  estimated  uncertainty  R.  That  is  the  reason  we 
choose  10001  as  the  a  priori  motion  uncertainty. 
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3.  Sherman-Morrison-Woodbury  Inversion:  Though  Sherman-Morrison- Woodbury 
inversion  significantly  reduces  the  amount  of  computation  and  memory  required 
compared  to  the  traditional  Gaussian  elimination  ([16]),  it  has  the  disadvantage 
of  being  more  fragile  numerically  ([10,  4],  Appendix  A).  From  our  experience, 
the  eigenvalues  of  UVT  have  to  be  at  most  104  of  the  eigenvalues  of  C  to  allow 
a  stable  numerical  inversion  of  C  +  UVT  using  Sherman-Morrison-Woodbury 
formula. 

Another  common  problem  of  a  structure  from  motion  system  is  the  handling  of 
the  disappearance  and  reappearance  of  parts  of  the  scene  due  to  relative  movement 
between  the  camera  and  the  scene.  Our  system  had  no  problem  dealing  with  new 
parts,  which  are  simply  assigned  a  preset  large  independent  uncertainty  and  a  zero 
correlated  uncertainty.  But  the  depth  information  of  disappearing  parts  is  discarded. 
In  the  future,  we  would  like  to  maintain  a  global  shape  module  such  that  the  structure 
information  of  disappeared  parts  could  be  stored  and  retrieved. 

6.2  Experiments 

We  tested  our  system  on  real  image  sequences  taken  by  a  Sony  XC-75  video  cam¬ 
era.  The  relative  camera  movements  in  all  the  sequences  involve  both  rotation  and 
translation.  We  digitized  images  in  two  ways.  One  is  to  digitize  by  matrox  board 
while  shooting  the  sequence.  In  order  to  digitize  while  taking  images,  we  mount  the 
camera  on  a  computer-controlled  6  DOF  platform  in  Calibrated  Imaging  Lab,  and 
stop  for  every  frame.  Another  way  is  to  record  the  sequence  on  Umatic  SP  video  tape, 
and  digitize  the  tape  frame  by  frame.  Unfortunately,  the  digitizing  device  we  have 
can  only  digitize  one  of  two  fields  in  every  frame,  and  the  videotape  also  introduces 
additional  noise  in  the  images.  We  will  demonstrate  the  performance  of  our  system 
on  image  sequences  digitized  both  ways.  All  images  in  our  experiments  are  480  x  512. 

6.2.1  Ambiguities 

It  is  well  known  that  there  are  intrinsic  ambiguities  in  recovering  structure  from 
motion.  The  first  kind  of  ambiguity,  i.e.  the  scale  ambiguity,  states  that  the  scale  of 
the  object  or  the  absolute  depth  of  the  object  can  never  be  recovered.  Secondly,  if  the 
camera  has  a  small  field  of  view,  the  optical  flow  caused  by  a  small  camera  rotation  is 
very  similar  to  that  caused  by  a  small  camera  translation.  Therefore,  given  an  optical 
flow,  there  is  a  rotation/translation  ambiguity.  Thirdly,  since  the  optical  flow  has  its 
uncertainty,  we  will  always  have  uncertainty  in  estimating  other  motion  parameters 
such  as  rotation  and  translation  around  0  axis  though  they  are  usually  less  significant. 
We  also  like  to  point  out  that  there  is  no  fundamental  difference  in  terms  of  origin 
between  the  second  and  the  third  kinds  of  ambiguities  other  than  their  magnitudes 
for  an  ordinary  camera.  Historically  the  second  kind  of  ambiguity  was  frequently 
singled  out  in  literature. 

In  our  system,  we  assign  an  initial  depth  and  uncertainty  to  the  first  frame.  Prac¬ 
tically  we  assign  a  flat  depth  map  and  uniform  independent  uncertainty  as  a  priori 


17 


depth  information.  It  serves  two  purposes,  which  are  disambiguation  of  the  scale  am¬ 
biguity  by  providing  absolute  depth,  and  allowing  deformation  of  the  a  priori  depth 
map  to  the  true  depth  map  by  providing  large  independent  uncertainty. 

Our  system  keeps  six  principal  eigenimages  to  represent  the  correlated  uncertainty 
as  shown  in  Figure  5.  Among  these  eigenimages,  the  first  one  usually  represents  the 
first  kind  of  ambiguity,  i.e.  the  scale  ambiguity.  The  second  and  the  third  ones  usually 
represent  the  second  kind  of  ambiguity  in  two  orthogonal  directions.  And  the  rest 
ones  represent  other  minor  ambiguities. 

Conceptually  the  independent  uncertainty  represents  a  chaotic  uncertainty  pat¬ 
tern,  while  the  correlated  uncertainty  represents  an  organized  uncertainty  pattern. 
For  example,  if  the  eigenimage  of  a  correlated  uncertainty  is  uniformly  bright,  it  rep¬ 
resents  that  the  corresponding  depth  map  can  move  back  and  forth.  In  other  words, 
the  depth  values  of  all  pixels  have  to  change  uniformly  while  the  shape  doesn’t  change 
at  all.  Since  we  set  a  priori  depth  uncertainty  as  totally  chaotic,  we  will  expect  that 
as  more  optical  flow  information  is  incorporated,  the  uncertainty  will  become  less  and 
less  chaotic,  more  and  more  organized.  In  our  framework,  that  means  that  magnitude 
of  the  independent  uncertainty  will  decrease  while  the  magnitude  of  the  correlated 
uncertainty  could  increase. 

Optical  flow  information  doesn’t  provide  anything  which  we  could  use  to  eliminate 
the  scale  ambiguity.  Therefore  we  expect  the  eigenimage  representing  scale  ambiguity 
in  correlated  uncertainty  will  have  larger  and  larger  eigenvalue.  On  the  other  hands, 
the  second  and  third  kinds  of  ambiguities  are  strong  in  some  optical  flows  while 
weak  in  other  ones.  Thus  the  eigenimages  representing  these  ambiguities  can  have 
increasing  or  decreasing  eigenvalues  depending  on  the  optical  flow  sequence. 

Figure  7  shows  one  frame  in  a  fifty-frame  sequence.  The  motions  of  the  camera 
with  respect  to  the  straw  hat  involve  translations  in  (X,  Y,  Z)  three  directions  and 
rotations  around  (X,  Y)  two  axis  from  the  1st  frame  to  the  30th  frame.  From  the 
30th  frame  to  the  40th  frame,  the  motions  are  translations  in  Y  direction  and  small 
rotations  around  X  axis2.  From  the  40th  frame  to  the  50th  frame,  the  motions  are 
translations  in  X  direction  and  small  rotations  around  Y  axis.  Figure  8  shows  the 
first  three  eigenimages,  and  Figure  9  shows  the  evolutions  of  average  independent 
uncertainty  and  the  eigenvalues  corresponding  to  the  three  eigenimages.  Note  that 
the  eigenimages  change  from  frame  to  frame.  In  the  examples  shown  here,  the  eigen¬ 
images  didn’t  change  dramatically  over  the  whole  sequences,  which  simplifies  the 
analysis  of  correlated  uncertainties.  First  of  all,  the  fact  that  the  average  indepen¬ 
dent  uncertainty  decreases  monotonically  and  the  first  eigenvalue  which  represents 
the  scale  ambiguity  increases  monotonically  indicates  a  steady  improvement  from  a 
chaotic  pattern  to  an  organized  pattern.  Secondly,  in  the  interval  between  the  30th 
and  the  40th  frame,  there  is  an  accumulating  ambiguity  of  translation  in  Y  direction 
versus  rotation  around  X  axis.  This  motion  ambiguity  mapping  into  structural  uncer¬ 
tainty  as  generally  increasing  third  eigenvalues.  And  because  there  is  no  ambiguity 
of  translation  in  X  direction  versus  rotation  around  Y  axis,  the  second  eigenvalue  de- 

2X  is  the  column  direction,  and  Y  is  the  row  direction 
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Figure  7:  One  Frame  in  A  Strawhat  Sequence 


First  Second  Third 


Figure  8:  Three  Eigenimages  For  the  Strawhat  Sequence 

creases  in  the  same  time.  For  the  similar  reason,  in  the  interval  between  the  40th  and 
the  50th  frame,  the  second  eigenvalue  increases  while  the  third  eigenvalue  decreases 
for  the  extactly  opposite  reason. 

If  the  underlying  camera  motions  or  the  optical  flows  of  the  whole  sequence  tend 
to  be  rather  homogeneous,  the  system  may  never  be  able  to  resolve  one  or  more 
ambiguities  intrinsic  to  this  type  of  optical  flows.  Under  this  case  the  correlated 
uncertainty  eigenimages  representing  the  second  kinds  of  ambiguity  may  have  an 
ever  increasing  eigenvalues,  which  represent  lack  of  information  to  disambiguate.  If 
we  have  active  control  over  the  camera,  the  eigenimages  could  then  be  used  to  plan 
the  camera  motion  in  order  to  resolve  the  ambiguities. 

Figure  10  (1)  shows  one  frame  of  a  road  sequence  which  was  shoot  using  a  camera 
fixed  on  a  moving  vehicle.  The  motion  of  the  camera  with  respect  to  the  scene  was 
very  homogeneous.  In  fact  all  the  optical  flows  are  similar  to  the  one  in  Figure  10 
(2).  Figure  11  shows  the  first  three  eigenimages  representing  the  correlated  uncer¬ 
tainty.  Figure  12  shows  the  average  independent  uncertainty  and  the  eigenvalues 
corresponding  to  the  three  eigenimages.  In  this  example,  we  can  see  that  the  ambigu- 
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Figure  10:  A  Road  Sequence 


Figure  11:  The  First  Three  Eigenimages  of  the  Road  Sequence 


ity  represented  by  the  second  eigenimage  was  never  resolved.  Comparing  the  optical 
flow  in  Figure  10,  it  is  obvious  that  these  optical  flows  provided  little  information 
to  resolve  the  second  kind  of  ambiguity  in  the  flow  direction,  i.e.  the  direction  from 
bottom-left  to  up-right,  while  they  did  provide  enough  information  to  resolve  the  sec¬ 
ond  kind  of  ambiguity  in  the  direction  perpendicular  to  the  flow  direction.  Secondly, 
unlike  the  previous  example,  the  magnitude  of  the  average  independent  uncertainty 
and  the  first  eigenvalue  were  approaching  constants.  In  other  words,  after  about  20 
frames,  the  improvement  from  the  chaotic  uncertainty  pattern  to  an  organized  uncer¬ 
tainty  pattern  seems  stopped.  The  reason  is  that  in  this  example,  there  is  continuous 
appearing  of  new  parts  which  were  assigned  chaotic  uncertainties  and  disappearing 
of  old  parts  whose  uncertainties  were  organized.  Therefore  after  certain  number  of 
frames,  the  improvement  from  chaotic  to  organized  uncertainty  pattern  obtained  in 
each  new  frame  was  totally  cancelled  by  the  introduction  of  new  parts  and  lose  of  old 
parts. 
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Figure  12:  Evolutions  of  Road  Uncertainties 

6.2.2  Experiments  on  Real  Sequences 

We  tested  our  system  on  many  image  sequences  with  different  signal  noise  ratio  and 
different  field  of  view.  No  pre-processing  was  done  on  image  sequences.  Once  we  ob¬ 
tained  a  depth  map  sequence  as  output  from  our  system,  we  masked  out  baskground 
areas  since  the  depth  information  in  these  areas  is  arbitrary.  The  separation  of  fore¬ 
ground  and  background  was  done  by  a  simple  thresholding  and  hole-filling. 

The  first  sequence  includes  fifty-one-frame  images  of  a  straw  hat  as  we  showed  in 
the  previous  section  The  images  were  digitized  by  a  matrox  board.  The  rotations 
and  translations  of  the  camera  with  respect  to  the  straw  hat  were  discontinuous. 
The  camera  had  about  an  11°  field  of  view.  The  optical  flow  and  its  uncertainty 
were  computed  using  hyper-geometric  filters  [21,  22].  Figure  13  shows  the  intensity 
images  and  depth  maps  computed  after  the  corresponding  frames.  It  clearly  shows 
the  converging  shape  resulting  from  recursively  combining  information  from  multiple 
frames. 

The  Chair  Sequence :  The  camera  had  an  22°  field  of  view.  The  object  was  a  real 
chair  we  used  in  our  lab.  The  chair  was  rotating  in  front  of  the  static  camera  as  in 
Figure  14.  Digitization  was  done  by  matrox  board.  The  optical  flow  computation 
sometimes  returned  wrong  results  at  some  locations,  which  we  believe  were  caused 
by  texture  aliasing.  We  can  see  that  even  in  the  tenth  frame,  the  two  buttons  are 
very  clear  in  the  depth  map.  Also  noticeable  is  that  part  of  the  chair  in  the  left  side 
is  moving  out  and  part  of  the  chair  in  the  right  side  is  moving  in.  The  move-in  part 
is  at  first  pretty  noisy,  and  then  gradually  becomes  smoother  and  smoother. 

The  Cube  Sequence:  The  camera  had  about  a  22°  field  of  view.  The  sequence  in 
Figure  15  was  taken  by  a  hand-held  video  camera  connected  to  a  Umatic  recorder. 
It  was  recorded  on  a  Umatic  SP  videotape  and  then  digitized  by  a  BVU  digitizer, 
which  could  only  capture  one  field  in  a  frame.  The  digitized  images  have  significantly 
higher  noise  levels  than  those  digitized  by  matrox  board.  We  can  see  that  the  system 
still  performs  pretty  well  on  those  noisy  images. 

The  Basket  Sequence:  The  camera  had  a  22°  field  of  view.  The  target  was  a  basket 
which  moving  and  rotating  in  front  of  the  camera.  The  digitization  was  done  the  same 
way  as  the  cube  sequence. 

The  Sphere  and  Dog  Sequence:  The  camera  had  an  11°  field  of  view.  The  target 
was  a  toy  dog  on  top  of  a  ball.  The  difficulties  of  this  sequence  are  that  (1)  the  ball 
had  a  very  low  contrast  near  its  boundaries;  (2)  it  had  obvious  specular  reflections 
which  will  confuse  the  optical  flow  algorithm,  and  (3)  the  toy  dog  had  a  sparse  texture. 
Despite  these  difficulties,  our  system  still  performs  reasonably  well  as  in  Figure  17. 
The  shape  of  the  sphere  and  dog  are  both  visible,  and  even  the  depth  of  the  tail  tip 
of  the  dog  is  correctly  shown. 

In  all  the  experiments,  the  initial  structure  information  was  set  as  a  flat  surface 
parallel  to  the  image  plane  at  depth  200  with  independent  uncertainty  a  =  100  at 
every  pixel.  Despite  such  a  crude  initial  estimation,  there  was  no  trouble  caused  by 
nonlinearity  in  our  experiments. 

From  all  these  experiments,  we  conclude  that  our  system  of  recursively  recovering 
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Second  Frame  Depth  Map  After  2nd  Frame 


10th  Frame  Depth  Map  After  10th  Frame 


50th  Frame  Depth  Map  After  50th  Frame 


Figure  13:  The  Straw  Hat  Sequence 
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Second  Frame  Depth  Map  After  2nd  Frame 


10th  Frame  Depth  Map  After  10th  Frame 


50th  Frame  Depth  Map  After  50th  Frame 


Figure  14:  The  Chair  Sequence 


Second  Frame 


10th  Frame 


50th  Frame 


Depth  Map  After  10th  Frame 


Depth  Map  After  50th  Frame 


Figure  15:  The  Cube  Sequence 


Second  Frame  Depth  Map  After  2nd  Frame 


20th  Frame  Depth  Map  After  20th  Frame 


90th  Frame  Depth  Map  After  90th  Frame 


Figure  16:  The  Basket  Sequence 


Second  Frame  Depth  Map  After  2nd  Frame 


10th  Frame  Depth  Map  After  10th  Frame 


50th  Frame  Depth  Map  After  50th  Frame 


Figure  17:  The  Sphere  and  Dog  Sequence 
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dense  structure  from  a  dense  optical  flow  sequence  can  converge  to  the  true  3D  shape 
of  the  scene  quickly  and  accurately.  We  have  demonstrated  its  performance  under 
adverse  conditions  such  as  noisy  images,  specularity,  and  texture  aliasing.  Even  under 
these  conditions,  the  system  performed  robustly. 


7  Summary 

In  summary,  we  presented  an  EKF-based  system  which  recursively  combine  dense 
structural  information  from  a  sequence  of  optical  flows.  At  current  stage,  our  system 
is  able  to  deliver  an  evolving  sequence  of  depth  maps  using  optical  flows.  We  also 
showed  that  the  system  was  very  robust  when  the  optical  flows  were  noisy  or  contain 
outliers  caused  by  texture  aliasing  and  specularity. 

Current  representation  of  3D  dense  information  by  depth  maps  and  their  uncer¬ 
tainty  is  very  limiting  in  that  complicated  objects  can  not  be  represented.  In  the 
future,  we  would  like  to  expand  our  system  to  deliver  a  final  3D  model  of  the  scene 
based  on  the  image  sequence.  In  other  words,  we  would  like  to  maintain  an  indepen¬ 
dent  module  to  store,  retrieve  and  update  3D  structural  information.  Therefore  we 
could  extract  a  priori  depth  information  from  the  module  for  every  optical  flow  frame, 
and  merge  posteriori  depth  information  into  the  module.  The  problem  of  representing 
3D  dense  structure  and  its  uncertainty  (independent  and  correlated)  still  remains  to 
be  very  challenging. 
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A  Sherman-Morrison- Woodbury  Inversion 

Given  a  full-rank  nxn  matrix  C,  and  its  perturbed  by  a  rank  m  matrix  UVT,  where 
U  and  V  are  both  n  x  m  matrices,  the  Sherman-Morrison- Woodbury  formula  [9] 
(page  225)  states  that 

(c  +  uvT)-1  =  cr1  -  c-1u(im  +  vTc-1u)-1vTc-1,  (53) 

where  Im  is  the  m  x  m  unit  matrix.  The  validity  of  this  inverse  can  be  easily  verified 
by  multiplying  both  sides  by  (C  +  UVT). 

In  a  more  concise  format,  we  can  write  the  above  equation  as 

(C  +  UVr)-1  =  Ca  +  UxVf,  (54) 

where 

Ci  =  C"1,  (55) 

Ui  =  -C-1U(Im+VTC-1U)-1,  (56) 

Vi  =  C~TV.  (57) 

In  our  application,  because  C  is  a  diagonal  matrix,  we  can  compute  its  inverse  Ca 
accurately.  Therefore  the  only  source  of  numerical  error  is  A  =  (Im  +  VTC-1U)-1. 
Suppose  the  error  is 

E  =  Im  —  (Im  +  VtC-1U)A,  (58) 

the  final  error  is 

(C  +  UVr)(Cx  +  UxVf)  -  I  =  UEVtC-1.  (59) 

Eq.  59  shows  that  there  is  a  potential  danger  that  the  error  E  could  be  magnified 
in  the  final  error.  That  is  the  source  of  fragility  of  Sherman-Morrison- Woodbury 
formula.  In  our  system,  we  reduce  that  risk  by  computing  A  is  in  double  precision 
and  limiting  the  magnification  factor  (roughly  the  ratio  between  eigenvalues  of  UVT 
and  those  of  C)  to  be  less  than  104  as  we  did  in  Section  6. 
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B  Eigen  Analysis  of  Symmetric  Outer  Products 


In  this  section,  we  consider  eigen  analysis  and  singular  value  decomposition  of  a 
special  kind  of  matrices,  i.e.  symmetric  outer  products  of  two  low-rank  matrices. 
Because  those  matrices  are  symmetric,  the  problem  of  computing  eigenvalues  and 
eigenvectors  is  identical  to  the  problem  of  singular  value  decomposition  because 
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where  U  and  V  are  both  nxm  (n  »  m)  matrices,  A,-,  *  =  1,2, 
and  &i ,  i  —  1, 2,  •  •  • ,  m  are  normalized  eigenvectors. 
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Theorem  B.l  Suppose  A =  1,2,-*-,  m  and  e?-,  i  —  1,2, 
and  normalized  eigenvectors  of  the  m  xm  matrix  VTU,  we  have  the  eigenvalues  and 
eigenvectors  of  n  x  n  matrix  UVT  as 


A  <  =  A?, 
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where  II  Ue”  ||  is  the  norm. 

The  proof  of  the  above  theorem  is  straightforward.  Since  A°  and  e?  are  an  eigen¬ 
value  and  eigenvector  of  VrU,  we  have 

VrUe?  =  A°e?.  (63) 


Multiplying  both  sides  by  U,  we  obtain 


(UVT)Ue?  =  A°Ue?. 


(64) 


Thus  we  have  the  eigenvalue  and  eigenvector  of  UVT  as  A?  and  Ue?. 

Additionally,  if  the  outer  product  UVT  is  also  positive  semi-definite,  which  is  true 
if  it  represents  covariance,  we  can  rewrite  it  as 
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